Hays County
- Asia > Middle East > Iran (0.16)
- Asia > China (0.07)
- Asia > Middle East > Jordan (0.05)
- (3 more...)
- Media > News (1.00)
- Government > Regional Government > North America Government > United States Government (1.00)
- Government > Military > Navy (1.00)
- Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles > Drones (0.47)
- Information Technology > Artificial Intelligence > Robots > Locomotion (0.42)
Learning to Forget with Information Divergence Reweighted Objectives for Noisy Labels
Birrell, Jeremiah, Ebrahimi, Reza
We introduce ANTIDOTE, a new class of objectives for learning under noisy labels, defined in terms of a relaxation over an information-divergence neighborhood. Using convex duality, we provide a reformulation as an adversarial training method with a computational cost similar to training with standard cross-entropy loss. We show that our approach adaptively reduces the influence of samples with noisy labels during learning, exhibiting a behavior analogous to forgetting those samples. ANTIDOTE is effective in practical environments where label noise is inherent in the training data or where an adversary can alter the training labels. Extensive empirical evaluations on different levels of symmetric, asymmetric, human-annotation, and real-world label noise show that ANTIDOTE outperforms leading comparable losses in the field and enjoys a time complexity very close to that of the standard cross-entropy loss.
- North America > United States > Florida > Hillsborough County > Tampa (0.14)
- Asia > Middle East > Jordan (0.04)
- North America > United States > Texas > Hays County > San Marcos (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
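The adaptive down-weighting the ANTIDOTE abstract describes can be illustrated with a minimal numpy sketch. This is a generic divergence-style reweighting (a softmax over negative per-sample losses), not the paper's actual objective; the temperature `tau` is an assumed hyperparameter.

```python
import numpy as np

def tilted_reweighted_loss(losses, tau=1.0):
    """Down-weight high-loss (likely noisy-label) samples.

    Generic illustration only: weights are a softmax over the negative
    per-sample losses, so samples with large loss (often those with
    corrupted labels) contribute less to the aggregate objective --
    loosely analogous to "forgetting" them.
    """
    w = np.exp(-losses / tau)
    w = w / w.sum()                      # normalized sample weights
    return float(np.dot(w, losses)), w   # reweighted loss and the weights

# Three clean samples with small loss vs. one noisy sample with large loss.
losses = np.array([0.1, 0.2, 0.15, 3.0])
loss, w = tilted_reweighted_loss(losses, tau=0.5)
# The high-loss sample receives the smallest weight.
```

Lowering `tau` sharpens the effect (more aggressive forgetting); raising it recovers a plain average over samples.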
Applying MambaAttention, TabPFN, and TabTransformers to Classify SAE Automation Levels in Crashes
Somvanshi, Shriyank, Tusti, Anannya Ghosh, Mimi, Mahmuda Sultana, Islam, Md Monzurul, Polock, Sazzad Bin Bashar, Dutta, Anandi, Das, Subasish
The increasing presence of automated vehicles (AVs) presents new challenges for crash classification and safety analysis. Accurately identifying the SAE automation level involved in each crash is essential to understanding crash dynamics and system accountability. However, existing approaches often overlook automation-specific factors and lack the model sophistication to capture distinctions between different SAE levels. To address this gap, this study evaluates the performance of three advanced tabular deep learning models (MambaAttention, TabPFN, and TabTransformer) for classifying SAE automation levels using structured crash data from Texas (2024), covering 4,649 cases categorized as Assisted Driving (SAE Level 1), Partial Automation (SAE Level 2), and Advanced Automation (SAE Levels 3-5 combined). Following class balancing using SMOTEENN, the models were trained and evaluated on a unified dataset of 7,300 records. MambaAttention demonstrated the highest overall performance (F1-scores: 88% for SAE 1, 97% for SAE 2, and 99% for SAE 3-5), while TabPFN excelled in zero-shot inference with high robustness for rare crash categories. In contrast, TabTransformer underperformed, particularly in detecting Partial Automation crashes (F1-score: 55%), suggesting challenges in modeling shared human-system control dynamics. These results highlight the capability of deep learning models tailored for tabular data to enhance the accuracy and efficiency of automation-level classification. Integrating such models into crash analysis frameworks can support policy development, AV safety evaluation, and regulatory decisions, especially in distinguishing high-risk conditions for mid- and high-level automation technologies.
- North America > United States > Texas > Hays County > San Marcos (0.05)
- North America > United States > West Virginia (0.04)
- North America > United States > California (0.04)
- (4 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (0.67)
- Transportation > Ground > Road (1.00)
- Government (1.00)
- Transportation > Passenger (0.93)
- (2 more...)
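The SMOTEENN balancing step mentioned above combines SMOTE oversampling with Edited Nearest Neighbours cleaning. The numpy sketch below shows only the SMOTE interpolation idea; a real pipeline would use imbalanced-learn's `SMOTEENN`, and the toy data here is illustrative.

```python
import numpy as np

def smote_oversample(X_min, n_new, k=3, rng=None):
    """SMOTE-style oversampling of a minority class (illustrative sketch).

    Each synthetic point is drawn on the segment between a minority
    sample and one of its k nearest minority neighbors. SMOTEENN, as
    used in the study above, additionally applies Edited Nearest
    Neighbours cleaning after oversampling (not shown here).
    """
    rng = np.random.default_rng(rng)
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        d = np.linalg.norm(X_min - X_min[i], axis=1)
        nn = np.argsort(d)[1:k + 1]          # k nearest neighbors, excluding the point itself
        j = rng.choice(nn)
        gap = rng.random()                   # random position along the segment
        synthetic.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.array(synthetic)

# Tiny minority class of 5 points in the unit square; generate 10 synthetic ones.
X_min = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]])
X_syn = smote_oversample(X_min, n_new=10, rng=0)
```

Because every synthetic point lies on a segment between two existing minority points, the oversampled data stays inside the minority class's convex hull.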
Can Large Language Models Challenge CNNs in Medical Image Analysis?
Ahmed, Shibbir, Sakib, Shahnewaz Karim, Das, Anindya Bijoy
This study presents a multimodal AI framework designed for precisely classifying medical diagnostic images. Utilizing publicly available datasets, the proposed system compares the strengths of convolutional neural networks (CNNs) and different large language models (LLMs). This in-depth comparative analysis highlights key differences in diagnostic performance, execution efficiency, and environmental impact. Model evaluation was based on accuracy, F1-score, average execution time, average energy consumption, and estimated $CO_2$ emissions. The findings indicate that although CNN-based models can outperform various multimodal techniques that incorporate both images and contextual information, applying additional filtering on top of LLMs can lead to substantial performance gains. These findings highlight the transformative potential of multimodal AI systems to enhance the reliability, efficiency, and scalability of medical diagnostics in clinical settings.
- North America > United States > Texas > Hays County > San Marcos (0.04)
- North America > United States > Tennessee > Hamilton County > Chattanooga (0.04)
- North America > United States > Ohio > Summit County > Akron (0.04)
- Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
- Health & Medicine > Therapeutic Area > Oncology (1.00)
- Health & Medicine > Diagnostic Medicine > Imaging (1.00)
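The abstract above evaluates models on energy consumption and estimated $CO_2$ emissions. A common back-of-envelope conversion multiplies measured energy (kWh) by a grid carbon-intensity factor; the 0.4 kg $CO_2$/kWh default below is an assumed illustrative value, not the figure used in the paper.

```python
def estimated_co2_kg(energy_kwh, carbon_intensity_kg_per_kwh=0.4):
    """Estimate CO2 emissions from measured energy use.

    carbon_intensity_kg_per_kwh is grid-dependent; 0.4 kg CO2/kWh is an
    assumed illustrative value, not the paper's factor.
    """
    return energy_kwh * carbon_intensity_kg_per_kwh

# 2.5 kWh consumed during inference -> 1.0 kg CO2 under this factor.
print(estimated_co2_kg(2.5))  # 1.0
```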
A Systematic Evaluation of LLM Strategies for Mental Health Text Analysis: Fine-tuning vs. Prompt Engineering vs. RAG
Kermani, Arshia, Perez-Rosas, Veronica, Metsis, Vangelis
This study presents a systematic comparison of three approaches for the analysis of mental health text using large language models (LLMs): prompt engineering, retrieval-augmented generation (RAG), and fine-tuning. Using LLaMA 3, we evaluate these approaches on emotion classification and mental health condition detection tasks across two datasets. Fine-tuning achieves the highest accuracy (91% for emotion classification, 80% for mental health conditions) but requires substantial computational resources and large training sets, while prompt engineering and RAG offer more flexible deployment with moderate performance (40-68% accuracy). Our findings provide practical insights for implementing LLM-based solutions in mental health applications, highlighting the trade-offs between accuracy, computational requirements, and deployment flexibility.
- North America > United States > Texas > Hays County > San Marcos (0.04)
- Europe > Belgium > Brussels-Capital Region > Brussels (0.04)
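The retrieval step that distinguishes RAG from plain prompting can be sketched in a few lines: embed the query and the corpus, rank by cosine similarity, and prepend the top passages to the LLM prompt. The toy vectors below are stand-ins for real sentence embeddings.

```python
import numpy as np

def retrieve_top_k(query_vec, doc_vecs, k=2):
    """Return indices of the k documents most similar to the query.

    Minimal sketch of the retrieval step in a RAG pipeline; in a real
    system the vectors would come from a sentence-embedding model and
    the retrieved passages would be added to the prompt context.
    """
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    sims = d @ q                          # cosine similarity per document
    return np.argsort(sims)[::-1][:k]     # indices, most similar first

# Toy 3-d embeddings: doc 0 is nearly parallel to the query.
docs = np.array([[1.0, 0.1, 0.0], [0.0, 1.0, 0.0], [0.0, 0.2, 1.0]])
query = np.array([1.0, 0.0, 0.1])
idx = retrieve_top_k(query, docs, k=2)
```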
A Survey on Structured State Space Sequence (S4) Models
Somvanshi, Shriyank, Islam, Md Monzurul, Mimi, Mahmuda Sultana, Polock, Sazzad Bin Bashar, Chhetri, Gaurab, Das, Subasish
Recent advancements in sequence modeling have led to the emergence of Structured State Space Models (SSMs) as an efficient alternative to Recurrent Neural Networks (RNNs) and Transformers, addressing challenges in long-range dependency modeling and computational efficiency. While RNNs suffer from vanishing gradients and sequential inefficiencies, and Transformers face quadratic complexity, SSMs leverage structured recurrence and state-space representations to achieve superior long-sequence processing with linear or near-linear complexity. This survey provides a comprehensive review of SSMs, tracing their evolution from the foundational S4 model to its successors like Mamba, Simplified Structured State Space Sequence Model (S5), and Jamba, highlighting their improvements in computational efficiency, memory optimization, and inference speed. By comparing SSMs with traditional sequence models across domains such as natural language processing (NLP), speech recognition, vision, and time-series forecasting, we demonstrate their advantages in handling long-range dependencies while reducing computational overhead. Despite their potential, challenges remain in areas such as training optimization, hybrid modeling, and interpretability. This survey serves as a structured guide for researchers and practitioners, detailing the advancements, trade-offs, and future directions of SSM-based architectures in AI and deep learning.
- North America > United States > Texas > Hays County > San Marcos (0.04)
- North America > United States > California > Santa Clara County > Santa Clara (0.04)
- Asia > Middle East > Jordan (0.04)
- Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
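The structured recurrence at the heart of the S4 family above is just a discrete linear state-space model scanned over the input. The sketch below shows the recurrence in its naive sequential form; S4 itself additionally exploits structure in $A$ and an equivalent convolutional view, neither of which is shown here.

```python
import numpy as np

def ssm_scan(A, B, C, u):
    """Run a discrete linear state-space model over an input sequence.

    x_{k+1} = A x_k + B u_k,  y_k = C x_k.
    Each step costs a fixed amount in the state dimension, so a
    length-L sequence is processed in time linear in L -- the property
    that lets SSMs avoid the Transformer's quadratic complexity.
    """
    x = np.zeros(A.shape[0])
    ys = []
    for u_k in u:
        ys.append(C @ x)        # readout from the current state
        x = A @ x + B * u_k     # state update
    return np.array(ys)

# A 2-state leaky integrator over a short scalar input sequence.
A = np.array([[0.9, 0.0], [0.0, 0.5]])
B = np.array([1.0, 1.0])
C = np.array([1.0, -1.0])
y = ssm_scan(A, B, C, [1.0, 0.0, 0.0])
```

Because $A$ is diagonal here, the two state components decay at different rates (0.9 vs. 0.5), and the readout measures their difference as the impulse fades.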
Crash Severity Analysis of Child Bicyclists using Arm-Net and MambaNet
Somvanshi, Shriyank, Chakraborty, Rohit, Das, Subasish, Dutta, Anandi K
Child bicyclists (14 years and younger) are among the most vulnerable road users, often experiencing severe injuries or fatalities in crashes. This study analyzed 2,394 child bicyclist crashes in Texas from 2017 to 2022 using two deep tabular learning models (ARM-Net and MambaNet). To address the issue of data imbalance, the SMOTEENN technique was applied, resulting in balanced datasets that facilitated accurate crash severity predictions across three categories: Fatal/Severe (KA), Moderate/Minor (BC), and No Injury (O). The findings revealed that MambaNet outperformed ARM-Net, achieving higher precision, recall, F1-scores, and accuracy, particularly in the KA and O categories. Both models highlighted challenges in distinguishing BC crashes due to overlapping characteristics. These insights underscored the value of advanced tabular deep learning methods and balanced datasets in understanding crash severity. While limitations such as reliance on categorical data exist, future research could explore continuous variables and real-time behavioral data to enhance predictive modeling and crash mitigation strategies.
- North America > United States > Texas > Hays County > San Marcos (0.05)
- Asia > Middle East > Israel (0.05)
- North America > United States > New York (0.04)
- (4 more...)
- Leisure & Entertainment > Sports > Cycling (0.88)
- Transportation (0.69)
Applying Tabular Deep Learning Models to Estimate Crash Injury Types of Young Motorcyclists
Somvanshi, Shriyank, Tusti, Anannya Ghosh, Chakraborty, Rohit, Das, Subasish
Young motorcyclists, particularly those aged 15 to 24 years old, face a heightened risk of severe crashes due to factors such as speeding, traffic violations, and helmet usage. This study aims to identify key factors influencing crash severity by analyzing 10,726 young motorcyclist crashes in Texas from 2017 to 2022. Two advanced tabular deep learning models, ARMNet and MambaNet, were employed, using an advanced resampling technique to address class imbalance. The models were trained to classify crashes into three severity levels: Fatal or Severe, Moderate or Minor, and No Injury. ARMNet achieved an accuracy of 87 percent, outperforming MambaNet's 86 percent, with both models excelling in predicting severe and no-injury crashes while facing challenges in moderate crash classification. Key findings highlight the significant influence of demographic, environmental, and behavioral factors on crash outcomes. The study underscores the need for targeted interventions, including stricter helmet enforcement and educational programs customized to young motorcyclists. These insights provide valuable guidance for policymakers in developing evidence-based strategies to enhance motorcyclist safety and reduce crash severity.
- North America > United States > Texas > Hays County > San Marcos (0.04)
- North America > Canada > British Columbia (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- (6 more...)
- Health & Medicine (1.00)
- Transportation > Ground > Road (0.96)
- Government (0.88)
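The crash-severity abstracts above report per-class precision, recall, and F1 across severity levels. These metrics come directly from the confusion matrix, as the short sketch below shows; the 3x3 matrix is illustrative toy data, not results from either study.

```python
import numpy as np

def per_class_f1(conf):
    """Per-class precision, recall, and F1 from a confusion matrix.

    conf[i, j] counts samples of true class i predicted as class j.
    Assumes every class has at least one true sample and one prediction.
    """
    tp = np.diag(conf).astype(float)
    precision = tp / conf.sum(axis=0)    # column sums: all predicted as class j
    recall = tp / conf.sum(axis=1)       # row sums: all truly class i
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Toy 3-class confusion matrix (rows: true class, cols: predicted class),
# e.g. severity levels KA / BC / O as in the studies above.
conf = np.array([[50,  5,  5],
                 [10, 30, 10],
                 [ 0,  5, 45]])
p, r, f1 = per_class_f1(conf)
```

In this toy matrix the middle class has the weakest F1, mirroring the pattern both studies report for moderate-severity crashes, whose features overlap with the neighboring categories.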
Positional Encoding in Transformer-Based Time Series Models: A Survey
Irani, Habib, Metsis, Vangelis
Recent advancements in transformer-based models have greatly improved time series analysis, providing robust solutions for tasks such as forecasting, anomaly detection, and classification. A crucial element of these models is positional encoding, which allows transformers to capture the intrinsic sequential nature of time series data. This survey systematically examines existing techniques for positional encoding in transformer-based time series models. We investigate a variety of methods, including fixed, learnable, relative, and hybrid approaches, and evaluate their effectiveness in different time series classification tasks. Furthermore, we outline key challenges and suggest potential research directions to enhance positional encoding strategies. By delivering a comprehensive overview and quantitative benchmarking, this survey intends to assist researchers and practitioners in selecting and designing effective positional encoding methods for transformer-based time series models. The source code for the methods and experiments discussed in this survey is available on
- North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.04)
- North America > United States > Texas > Hays County > San Marcos (0.04)
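Of the encoding families the survey covers, the fixed sinusoidal scheme is the simplest to write down. The sketch below builds the standard sinusoidal table; learnable, relative, and hybrid schemes replace or augment this fixed table.

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    """Fixed sinusoidal positional encoding (the classic Transformer form).

    PE[pos, 2i]   = sin(pos / 10000^(2i/d_model))
    PE[pos, 2i+1] = cos(pos / 10000^(2i/d_model))
    Assumes d_model is even. Each position gets a unique pattern of
    phases, letting attention recover relative order from the inputs.
    """
    pos = np.arange(seq_len)[:, None]
    i = np.arange(0, d_model, 2)[None, :]
    angles = pos / np.power(10000.0, i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

# A short table: 16 time steps, 8-dimensional model.
pe = sinusoidal_positional_encoding(seq_len=16, d_model=8)
```

The table is added elementwise to the input embeddings before the first attention layer, which is how the transformer learns to distinguish time step 3 from time step 13.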
An Interactive Framework for Implementing Privacy-Preserving Federated Learning: Experiments on Large Language Models
Ahmadi, Kasra, Behnia, Rouzbeh, Ebrahimi, Reza, Kermani, Mehran Mozaffari, Birrell, Jeremiah, Pacheco, Jason, Yavuz, Attila A
Federated learning (FL) enhances privacy by keeping user data on local devices. However, emerging attacks have demonstrated that the updates shared by users during training can reveal significant information about their data. This has greatly hindered the adoption of FL methods for training robust AI models in sensitive applications. Differential Privacy (DP) is considered the gold standard for safeguarding user data, but DP guarantees are highly conservative, reflecting worst-case assumptions. This can result in overestimating privacy needs, which may compromise the model's accuracy. Additionally, interpreting these privacy guarantees has proven challenging in different contexts, and factors such as the number of training iterations, the data distribution, and application-specific requirements add further complexity. In this work, we propose a framework that integrates a human entity as a privacy practitioner to determine an optimal trade-off between the model's privacy and utility. Our framework is the first to address the variable memory requirement of existing DP methods in FL settings where resource-limited devices (e.g., cell phones) can participate. To support such settings, we adopt a recent DP method with fixed memory usage to ensure scalable private FL. We evaluated our proposed framework by fine-tuning a BERT-based language model on the GLUE dataset (a common approach in the literature), leveraging the new accountant and employing diverse data partitioning strategies to mimic real-world conditions. As a result, we achieved stable memory usage, with an average accuracy reduction of 1.33% for $\epsilon = 10$ and 1.9% for $\epsilon = 6$, compared to the state-of-the-art DP accountant, which does not support fixed memory usage.
- North America > United States > Florida > Hillsborough County > Tampa (0.14)
- North America > United States > Florida > Hillsborough County > University (0.04)
- North America > United States > Texas > Hays County > San Marcos (0.04)
- (3 more...)
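The per-client mechanics behind DP guarantees like those above can be sketched generically: clip each user's update to bound its influence, then add calibrated Gaussian noise. This is a standard DP-SGD-style step, not the paper's method; the fixed-memory accountant that tracks $(\epsilon, \delta)$ across rounds is the paper's contribution and is not shown.

```python
import numpy as np

def privatize_update(update, clip_norm=1.0, noise_mult=1.0, rng=None):
    """Clip a client's model update and add Gaussian noise (DP-SGD style).

    Generic sketch of the per-client step in differentially private
    federated learning: L2 clipping bounds each user's contribution,
    and Gaussian noise scaled to the clip norm provides the DP
    guarantee (via a separate privacy accountant, not shown).
    """
    rng = np.random.default_rng(rng)
    norm = np.linalg.norm(update)
    clipped = update * min(1.0, clip_norm / max(norm, 1e-12))  # scale into the L2 ball
    noise = rng.normal(0.0, noise_mult * clip_norm, size=update.shape)
    return clipped + noise

u = np.array([3.0, 4.0])                  # norm 5 -> clipped to norm 1
private_u = privatize_update(u, clip_norm=1.0, noise_mult=0.1, rng=0)
```

The server aggregates these noisy updates; larger `noise_mult` buys a smaller epsilon at the cost of accuracy, the trade-off the framework above puts in front of a human privacy practitioner.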